Oversampling method for intrusion detection based on clustering and instance hardness
WANG Yao, SUN Guozi
Journal of Computer Applications    2021, 41 (6): 1709-1714.   DOI: 10.11772/j.issn.1001-9081.2020091378
Aiming at the low detection efficiency of intrusion detection models caused by imbalanced network traffic data, a Clustering and instance Hardness-based Oversampling method for intrusion detection (CHO) was proposed. Firstly, the hardness of each minority-class sample was measured as input by calculating the proportion of majority-class samples among its neighbors. Secondly, the Canopy approach was used to pre-cluster the minority data, and the resulting number of clusters was taken as the cluster parameter of K-means++ for a second clustering. Then, the average hardness and the standard deviation of each cluster were calculated; the former was taken as the "investigation cost" in the optimum allocation theory of statistics, and the amount of data to be generated in each cluster was determined by this theory. Finally, the "safe" regions within the clusters were further identified according to the hardness values, and the specified amounts of data were generated in these regions by interpolation. Comparative experiments were carried out on six open intrusion detection datasets. The proposed method achieves the optimal value of 1.33 on both Area Under Curve (AUC) and Geometric mean (G-mean), and improves AUC by 1.6 percentage points on average over Synthetic Minority Oversampling TEchnique (SMOTE) on four of the six datasets. The experimental results show that the proposed method is well suited to imbalance problems in intrusion detection.
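The abstract outlines the two core steps of CHO: measuring instance hardness as the proportion of majority-class neighbors, and interpolating new points only in "safe" (low-hardness) regions. The following is a minimal numpy sketch of those two steps under simplifying assumptions: Euclidean k-nearest neighbors, a single cluster (the Canopy/K-means++ stage and optimum-allocation step are omitted), and a hypothetical hardness threshold of 0.5 for "safe"; the function names are illustrative, not from the paper.

```python
import numpy as np

def instance_hardness(X_min, X_maj, k=5):
    """Hardness of each minority sample: the fraction of its k nearest
    neighbours (among all samples) that belong to the majority class."""
    X_all = np.vstack([X_min, X_maj])
    labels = np.array([0] * len(X_min) + [1] * len(X_maj))  # 1 = majority
    hard = np.empty(len(X_min))
    for i, x in enumerate(X_min):
        d = np.linalg.norm(X_all - x, axis=1)
        d[i] = np.inf                       # exclude the sample itself
        nn = np.argsort(d)[:k]              # indices of k nearest neighbours
        hard[i] = labels[nn].mean()         # proportion of majority neighbours
    return hard

def oversample_safe(X_min, hardness, n_new, threshold=0.5, seed=None):
    """Generate n_new synthetic points by interpolating between random pairs
    of 'safe' minority samples (hardness below threshold), SMOTE-style."""
    rng = np.random.default_rng(seed)
    safe = X_min[hardness < threshold]
    if len(safe) < 2:
        safe = X_min                        # fall back to all minority data
    i = rng.integers(0, len(safe), n_new)
    j = rng.integers(0, len(safe), n_new)
    t = rng.random((n_new, 1))              # interpolation coefficients
    return safe[i] + t * (safe[j] - safe[i])
```

In the full method, `oversample_safe` would be run per cluster, with each cluster's `n_new` set by optimum allocation using the cluster's average hardness as the cost term.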
Fake news content detection model based on feature aggregation
HE Hansen, SUN Guozi
Journal of Computer Applications    2020, 40 (8): 2189-2193.   DOI: 10.11772/j.issn.1001-9081.2019122114
Concerning the problem that the detection performance and generalization performance of classification models for fake news content detection cannot be achieved at the same time, a feature-aggregation model named CCNN (Center-Cluster-Neural-Network) was proposed. Firstly, global temporal features of the text were extracted by a Bi-directional Long Short-Term Memory (Bi-LSTM) network, and word and phrase features within a window were extracted by a Convolutional Neural Network (CNN). Secondly, a feature aggregation layer trained with a dual center loss was placed after the CNN pooling layer. Finally, the Bi-LSTM and CNN feature maps were stitched into one vector along the depth dimension and fed to the fully connected layer, and the final classification result was output by the model trained with a uniform loss function (uniform-sigmoid). Experimental results show that the proposed model achieves an F1 value of 80.5%, with a difference of only 1.3% between the training and validation sets. Compared with traditional models such as Support Vector Machine (SVM), Naïve Bayes (NB) and Random Forest (RF), the proposed model improves the F1 value by 9%-14%; compared with neural network models such as Long Short-Term Memory (LSTM) and FastText, it improves generalization performance by 1.3%-2.5%. It can be seen that the proposed algorithm improves classification performance while maintaining generalization ability, so the overall performance is enhanced.
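The key structural idea in the abstract is the aggregation step: a recurrent branch and a convolutional branch are run over the same token embeddings, their feature vectors are stitched together, and a classifier head produces the output. Below is a minimal numpy sketch of that two-branch forward pass under stated assumptions: the Bi-LSTM is replaced by a toy bidirectional tanh RNN, the center-loss training and uniform loss are omitted, and all parameter names (`Wc`, `Wx`, `Wh`, `Wo`, etc.) are hypothetical.

```python
import numpy as np

def cnn_branch(X, W, b):
    """1-D convolution over the token axis plus global max pooling:
    captures word/phrase features within the filter window."""
    T, d = X.shape                  # T tokens, d-dim embeddings
    w = W.shape[0]                  # W: (window, d, n_filters)
    outs = []
    for t in range(T - w + 1):
        patch = X[t:t + w]          # (w, d) window of embeddings
        outs.append(np.tensordot(patch, W, axes=([0, 1], [0, 1])) + b)
    conv = np.maximum(np.stack(outs), 0.0)   # ReLU, shape (T-w+1, n_filters)
    return conv.max(axis=0)                  # global max pool -> (n_filters,)

def rnn_branch(X, Wx, Wh):
    """Toy stand-in for the Bi-LSTM branch: a simple tanh RNN run forward
    and backward, concatenating the two final hidden states."""
    def run(seq):
        h = np.zeros(Wh.shape[0])
        for x in seq:
            h = np.tanh(Wx @ x + Wh @ h)
        return h
    return np.concatenate([run(X), run(X[::-1])])

def aggregate(X, p):
    """Stitch both branches' features into one vector (the aggregation
    step) and apply a sigmoid classifier head."""
    feats = np.concatenate([cnn_branch(X, p["Wc"], p["bc"]),
                            rnn_branch(X, p["Wx"], p["Wh"])])
    z = p["Wo"] @ feats + p["bo"]
    return 1.0 / (1.0 + np.exp(-z))          # probability of "fake"
```

The concatenation is what lets the dense head weigh local n-gram evidence (CNN branch) against long-range context (recurrent branch) in a single decision.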